Overview

Dataset Statistics

Number of Variables 14
Number of Rows 1309
Missing Cells 3855
Missing Cells (%) 21.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 431.3 KB
Average Row Size in Memory 337.4 B
Variable Types
  • Categorical: 11
  • Numerical: 3
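The overall missing rate and the average row size can be checked against the per-variable figures in this report. The dictionary below restates the Missing counts reported for each variable; variables not listed have 0. Treating "KB" as 1024 bytes is an assumption, but it is the reading that reproduces the reported 337.4 B.

```python
# Per-variable Missing counts restated from this report; variables not listed have 0.
missing_by_column = {
    "age": 263,
    "fare": 1,
    "cabin": 1014,
    "embarked": 2,
    "boat": 823,
    "body": 1188,
    "home.dest": 564,
}

n_rows = 1309
n_vars = 14

total_cells = n_rows * n_vars                      # every cell in the table
missing_cells = sum(missing_by_column.values())    # empty cells across all variables
missing_pct = 100 * missing_cells / total_cells

# Assumption: "KB" means 1024 bytes.
total_bytes = 431.3 * 1024
avg_row_bytes = total_bytes / n_rows

print(missing_cells)             # 3855
print(f"{missing_pct:.1f}%")     # 21.0%
print(f"{avg_row_bytes:.1f} B")  # 337.4 B
```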

Dataset Insights

age has 263 (20.09%) missing values [Missing]
cabin has 1014 (77.46%) missing values [Missing]
boat has 823 (62.87%) missing values [Missing]
body has 1188 (90.76%) missing values [Missing]
home.dest has 564 (43.09%) missing values [Missing]
fare is skewed [Skewed]
name has high cardinality: 1307 distinct values [High Cardinality]
ticket has high cardinality: 929 distinct values [High Cardinality]
cabin has high cardinality: 186 distinct values [High Cardinality]
home.dest has high cardinality: 369 distinct values [High Cardinality]
pclass has constant length 3 [Constant Length]
sibsp has constant length 3 [Constant Length]
parch has constant length 3 [Constant Length]
embarked has constant length 1 [Constant Length]
survived has constant length 1 [Constant Length]
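The Missing insights above can be reproduced with a simple missing-rate threshold. This is a sketch only: the 1% threshold is a hypothetical value, chosen because it separates the five flagged variables from fare (1 missing) and embarked (2 missing); it is not taken from DataPrep's configuration, and `missing_insights` is a hypothetical helper.

```python
n_rows = 1309
missing_by_column = {
    "age": 263, "fare": 1, "cabin": 1014, "embarked": 2,
    "boat": 823, "body": 1188, "home.dest": 564,
}

# Hypothetical threshold in percent, chosen for illustration only.
MISSING_THRESHOLD_PCT = 1.0

def missing_insights(missing, n_rows, threshold_pct):
    """Return (column, count, percent) for each column whose missing rate exceeds threshold_pct."""
    flagged = []
    for column, count in missing.items():
        pct = 100 * count / n_rows
        if pct > threshold_pct:
            flagged.append((column, count, round(pct, 2)))
    return flagged

for column, count, pct in missing_insights(missing_by_column, n_rows, MISSING_THRESHOLD_PCT):
    print(f"{column} has {count} ({pct:.2f}%) missing values [Missing]")
```

With these inputs the loop prints one line each for age, cabin, boat, body and home.dest, in that order, with the same percentages as the insights list.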

Variables


pclass

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size (bytes) 89012
  • The most frequent value (3.0) occurs over 2.2 times as often as the second most frequent value (1.0)

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 1.0
2nd row 1.0
3rd row 1.0
4th row 1.0
5th row 1.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 2618
  • The top 2 categories (3.0, 1.0) account for over 50.0% of values
  • pclass has words of constant length
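The Letter rows appear to correspond to Unicode general categories (Ll, Zs, Lu, Pd, Nd), with Count equal to lowercase plus uppercase letters (for name, 22841 + 5383 = 28224). Assuming that mapping, the sketch below shows why pclass, stored as strings such as "3.0", has constant length 3 and 2618 decimal digits: each value contributes two digits and one period (category Po, which the report does not list). `char_profile` is a hypothetical helper, not a DataPrep function.

```python
import unicodedata
from collections import Counter

# Unicode general-category codes mapped to the row labels used in the Letter section.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def char_profile(values):
    """Hypothetical helper: count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    profile = {label: counts[code] for code, label in CATEGORY_LABELS.items()}
    # "Count" is taken to be all letters: lowercase plus uppercase.
    profile["Count"] = counts["Ll"] + counts["Lu"]
    return profile

# Each pclass value is a 3-character string such as "3.0": two digits (Nd) and a period (Po).
per_value = char_profile(["3.0"])
print(per_value["Decimal Number"])          # 2
print(1309 * per_value["Decimal Number"])   # 2618
```

The same reasoning explains the identical Length and Decimal Number figures for sibsp and parch, whose values are also stored as strings like "0.0".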

name

categorical

Approximate Distinct Count 1307
Approximate Unique (%) 99.9%
Missing 0
Missing (%) 0.0%
Memory Size (bytes) 120595

Length

Mean 27.1276
Standard Deviation 9.5039
Median 25
Minimum 12
Maximum 82

Sample

1st row Allen, Miss. Elisa...
2nd row Allison, Master. H...
3rd row Allison, Miss. Hel...
4th row Allison, Mr. Hudso...
5th row Allison, Mrs. Huds...

Letter

Count 28224
Lowercase Letter 22841
Space Separator 4040
Uppercase Letter 5383
Dash Punctuation 19
Decimal Number 0
  • name contains many distinct words: 1947
  • The most frequent word (mr) occurs over 2.93 times as often as the second most frequent word (miss)
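The name insight counts individual words rather than whole values, which is why the titles appear in lowercase and without their periods. A minimal sketch of such a word count, assuming words are lowercased runs of ASCII letters; DataPrep's own tokenizer may differ (for example by dropping stop words). Only the visible prefixes of the five sample rows are used, because the rest of each name is truncated in the report.

```python
import re
from collections import Counter

def word_counts(values):
    """Lowercase each value, keep runs of ASCII letters as words, and count them."""
    counts = Counter()
    for value in values:
        counts.update(re.findall(r"[a-z]+", value.lower()))
    return counts

# Visible prefixes of the five sample rows only; the rest of each name is truncated above.
names = ["Allen, Miss.", "Allison, Master.", "Allison, Miss.", "Allison, Mr.", "Allison, Mrs."]
counts = word_counts(names)

(top, top_n), (second, second_n) = counts.most_common(2)
print(top, top_n, second, second_n)   # allison 4 miss 2
print(top_n / second_n)               # 2.0
```

On these five prefixes the shared surname dominates; across all 1309 names the report finds mr first and miss second.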

sex

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size (bytes) 12013
  • The most frequent value (male) occurs over 1.81 times as often as the second most frequent value (female)

Length

Mean 4.712
Standard Deviation 0.958
Median 4
Minimum 4
Maximum 6

Sample

1st row female
2nd row male
3rd row female
4th row male
5th row female

Letter

Count 6168
Lowercase Letter 6168
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (male, female) account for over 50.0% of values
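The sex section contains enough information to recover the category counts. Only "male" (4 letters) and "female" (6 letters) occur, so the letter Count of 6168 and the 1309 rows give two linear equations. Solving them reproduces the reported mean length, its standard deviation and the 1.81 frequency ratio:

```python
import statistics

n_rows = 1309
total_letters = 6168  # "Count" row of the Letter section for sex

# Only "male" (4 letters) and "female" (6 letters) occur, so
#   4 * male + 6 * female = total_letters  and  male + female = n_rows.
female = (total_letters - 4 * n_rows) // 2
male = n_rows - female

lengths = [4] * male + [6] * female
mean_length = statistics.mean(lengths)
sd_length = statistics.stdev(lengths)   # sample standard deviation
ratio = male / female

print(male, female)             # 843 466
print(round(mean_length, 3))    # 4.712
print(round(sd_length, 3))      # 0.958
print(round(ratio, 2))          # 1.81
```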

age

numerical

Approximate Distinct Count 98
Approximate Unique (%) 9.4%
Missing 263
Missing (%) 20.1%
Infinite 0
Infinite (%) 0.0%
Memory Size (bytes) 16736
Mean 29.8811
Minimum 0.1667
Maximum 80
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • age is skewed right (γ1 = 0.4071)

Quantile Statistics

Minimum 0.1667
5th Percentile 5
Q1 21
Median 28
Q3 39
95th Percentile 57
Maximum 80
Range 79.8333
IQR 18

Descriptive Statistics

Mean 29.8811
Standard Deviation 14.4135
Variance 207.749
Sum 31255.6667
Skewness 0.4071
Kurtosis 0.1405
Coefficient of Variation 0.4824
  • age is not normally distributed (p-value ≈ 0.000122)
  • age has 9 outliers
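The γ1 in the skewness insight is the usual Fisher-Pearson moment coefficient. Kurtosis must be excess kurtosis (0 for a normal distribution), since ordinary kurtosis can never be negative and body reports -1.252. The sketch below uses the plain moment estimators; whether DataPrep applies a small-sample bias correction is not stated, so recomputed values may differ slightly. The last lines check that Mean equals Sum divided by the 1309 - 263 = 1046 non-missing ages.

```python
import math

def moments(xs):
    """Return (mean, sample std, skewness g1, excess kurtosis g2) using plain moment estimators."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    std = math.sqrt(m2 * n / (n - 1))  # sample standard deviation (n - 1 denominator)
    g1 = m3 / m2 ** 1.5                # Fisher-Pearson coefficient of skewness
    g2 = m4 / m2 ** 2 - 3.0            # excess kurtosis: 0 for a normal distribution
    return mean, std, g1, g2

# Toy data: symmetric, so skewness is 0; flatter than a normal curve, so g2 < 0.
print(moments([1, 2, 3, 4, 5]))

# Consistency check on the reported age figures: Mean = Sum / non-missing count.
age_n = 1309 - 263
print(round(31255.6667 / age_n, 4))  # 29.8811
```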

sibsp

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Memory Size (bytes) 89012
  • The most frequent value (0.0) occurs over 2.79 times as often as the second most frequent value (1.0)

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 0.0
2nd row 1.0
3rd row 1.0
4th row 1.0
5th row 1.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 2618
  • The top 2 categories (0.0, 1.0) account for over 50.0% of values
  • sibsp has words of constant length

parch

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory Size (bytes) 89012
  • The most frequent value (0.0) occurs over 5.89 times as often as the second most frequent value (1.0)

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 0.0
2nd row 2.0
3rd row 2.0
4th row 2.0
5th row 2.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 2618
  • The top 2 categories (0.0, 1.0) account for over 50.0% of values
  • parch has words of constant length

ticket

categorical

Approximate Distinct Count 929
Approximate Unique (%) 71.0%
Missing 0
Missing (%) 0.0%
Memory Size (bytes) 93974

Length

Mean 6.7907
Standard Deviation 2.7695
Median 6
Minimum 3
Maximum 18

Sample

1st row 24160
2nd row 113781
3rd row 113781
4th row 113781
5th row 113781

Letter

Count 1026
Lowercase Letter 25
Space Separator 364
Uppercase Letter 1001
Dash Punctuation 0
Decimal Number 7032

fare

numerical

Approximate Distinct Count 281
Approximate Unique (%) 21.5%
Missing 1
Missing (%) 0.1%
Infinite 0
Infinite (%) 0.0%
Memory Size (bytes) 20928
Mean 33.2955
Minimum 0
Maximum 512.3292
Zeros 17
Zeros (%) 1.3%
Negatives 0
Negatives (%) 0.0%
  • fare is skewed right (γ1 = 4.3627)

Quantile Statistics

Minimum 0
5th Percentile 7.225
Q1 7.8958
Median 14.4542
Q3 31.275
95th Percentile 133.65
Maximum 512.3292
Range 512.3292
IQR 23.3792

Descriptive Statistics

Mean 33.2955
Standard Deviation 51.7587
Variance 2678.9597
Sum 43550.4869
Skewness 4.3627
Kurtosis 26.9202
Coefficient of Variation 1.5545
  • fare is not normally distributed (p-value ≈ 6.15e-18)
  • fare has 171 outliers
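The report does not say which rule produces its outlier counts. A common choice is Tukey's fences at 1.5 × IQR beyond the quartiles; the sketch below applies that rule, as an assumption, to fare's reported Q1 and Q3. Because the lower fence is negative and fare has no negative values, only the upper tail can be flagged under this rule. Whether it yields exactly 171 outliers cannot be verified from the summary alone.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(values, q1, q3, k=1.5):
    """Count values falling strictly outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < lower or v > upper)

# Reported quartiles for fare.
lower, upper = tukey_fences(7.8958, 31.275)
print(round(lower, 4), round(upper, 4))  # -27.173 66.3438

# Toy values for illustration: only 70 and 512.3292 lie above the upper fence.
print(count_outliers([0, 10, 70, 512.3292], 7.8958, 31.275))  # 2
```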

cabin

categorical

Approximate Distinct Count 186
Approximate Unique (%) 63.0%
Missing 1014
Missing (%) 77.5%
Memory Size (bytes) 20278

Length

Mean 3.739
Standard Deviation 2.3466
Median 3
Minimum 1
Maximum 15

Sample

1st row B5
2nd row C22 C26
3rd row C22 C26
4th row C22 C26
5th row C22 C26

Letter

Count 356
Lowercase Letter 0
Space Separator 61
Uppercase Letter 356
Dash Punctuation 0
Decimal Number 686
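Some cabin values list several cabins separated by spaces (for example C22 C26), which is where the 61 Space Separator characters come from. Since 1309 - 1014 = 295 rows have a cabin, at most 61 of them hold more than one. A minimal sketch of splitting such values; `split_cabins` is a hypothetical helper:

```python
def split_cabins(value):
    """Hypothetical helper: split a space-separated cabin string into individual cabins."""
    return value.split()

# Sample rows from the cabin section above.
sample = ["B5", "C22 C26", "C22 C26", "C22 C26", "C22 C26"]
cabins_per_row = [len(split_cabins(v)) for v in sample]
print(cabins_per_row)  # [1, 2, 2, 2, 2]

# Rows with a cabin value: total rows minus missing.
non_missing = 1309 - 1014
print(non_missing)     # 295
```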

embarked

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.2%
Missing 2
Missing (%) 0.2%
Memory Size (bytes) 12045
  • The most frequent value (S) occurs over 3.39 times as often as the second most frequent value (C)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row S
2nd row S
3rd row S
4th row S
5th row S

Letter

Count 1307
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1307
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (S, C) account for over 50.0% of values
  • embarked has words of constant length

boat

categorical

Approximate Distinct Count 27
Approximate Unique (%) 5.6%
Missing 823
Missing (%) 62.9%
Memory Size (bytes) 32312

Length

Mean 1.4856
Standard Deviation 0.6476
Median 1
Minimum 1
Maximum 7

Sample

1st row 2
2nd row 11
3rd row 3
4th row 10
5th row D

Letter

Count 83
Lowercase Letter 0
Space Separator 11
Uppercase Letter 83
Dash Punctuation 0
Decimal Number 628

body

numerical

Approximate Distinct Count 121
Approximate Unique (%) 100.0%
Missing 1188
Missing (%) 90.8%
Infinite 0
Infinite (%) 0.0%
Memory Size (bytes) 1936
Mean 160.8099
Minimum 1
Maximum 328
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • body is only slightly skewed right (γ1 = 0.0906), i.e. nearly symmetric

Quantile Statistics

Minimum 1
5th Percentile 16
Q1 72
Median 155
Q3 256
95th Percentile 307
Maximum 328
Range 327
IQR 184

Descriptive Statistics

Mean 160.8099
Standard Deviation 97.6969
Variance 9544.6886
Sum 19458
Skewness 0.0906
Kurtosis -1.252
Coefficient of Variation 0.6075
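Several rows in the body summary are derived from others: Range = Maximum - Minimum, IQR = Q3 - Q1, Variance = Standard Deviation², Coefficient of Variation = Standard Deviation / Mean, and Sum = Mean × n with n = 1309 - 1188 = 121 non-missing values. Recomputing them from the rounded reported values is a quick consistency check:

```python
# Reported summary values for body.
n = 1309 - 1188          # non-missing values
mean = 160.8099
std = 97.6969
q1, q3 = 72, 256
minimum, maximum = 1, 328

value_range = maximum - minimum
iqr = q3 - q1
variance = std ** 2
cv = std / mean          # coefficient of variation
total = mean * n

print(n, value_range, iqr)   # 121 327 184
print(round(variance, 1))    # 9544.7
print(round(cv, 4))          # 0.6075
print(round(total))          # 19458
```

Squaring the rounded standard deviation gives about 9544.68, which agrees with the reported 9544.6886 to one decimal place; the small gap comes from rounding the standard deviation.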

home.dest

categorical

Approximate Distinct Count 369
Approximate Unique (%) 49.5%
Missing 564
Missing (%) 43.1%
Memory Size (bytes) 62703
  • The most frequent value (New York, NY) occurs over 4.57 times as often as the second most frequent value (London)

Length

Mean 19.1651
Standard Deviation 8.8731
Median 17
Minimum 5
Maximum 50

Sample

1st row St Louis, MO
2nd row Montreal, PQ / Che...
3rd row Montreal, PQ / Che...
4th row Montreal, PQ / Che...
5th row Montreal, PQ / Che...

Letter

Count 11580
Lowercase Letter 8777
Space Separator 1645
Uppercase Letter 2803
Dash Punctuation 19
Decimal Number 0

survived

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size (bytes) 12005
  • The most frequent value (0) occurs over 1.62 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1309
  • The top 2 categories (0, 1) account for over 50.0% of values
  • survived has words of constant length

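Interactions

[Plot not reproduced in this text version]

Correlations

[Plot not reproduced in this text version]

Missing Values

[Plot not reproduced in this text version]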